Molecular & Cellular Proteomics — Latest Matching Preprints

1

Ground Truth-Based Evaluation of False Discovery Rate and Statistical Power in DIA Proteomics

Yarbro, J. M.; Huang, Y.; Pagala, V.; Fu, Y.; Wang, Z.; Wu, L.; Wang, X.; High, A. A.; Byrum, S.; Peng, J.; Yuan, Z.-F.

2026-06-02 bioinformatics 10.64898/2026.05.29.728747 medRxiv

Top 0.2%

11.8%

Data-independent acquisition (DIA) mass spectrometry enables rapid proteomic quantification, yet the reliability of statistical inference in DIA-based protein quantification remains incompletely understood. Here, we systematically evaluated missingness, false discovery rate (FDR), and statistical power, defined as true positive rate (i.e., sensitivity or recall), using technical replicates and a spike-in benchmark with known ground truth. Analysis of 18 HeLa replicates revealed persistent, abundance-dependent missingness. In the spike-in experiment with five replicates, human peptides were titrated against a stable yeast background, allowing fold changes (FCs) to be compared with expected values. Across comparisons with log2FCs ranging from 0.2 to 2.5, the nominal BH-FDR substantially underestimated the true FDR. For example, at a BH-FDR threshold of 0.05, the true FDR was ~0.2. Statistical power was ~40% for a log2FC of 0.2 and increased to nearly 100% for a log2FC of 2.5. Additional incorporation of FC thresholds improved the true FDR for large-FC comparisons, with slight loss of power, but markedly reduced sensitivity for small-FC comparisons. Together, these results indicate that nominal FDR does not necessarily reflect actual error rates in DIA proteomics and that DIA performance is influenced by protein abundance and expected fold changes. This study provides a framework for experimental design and data interpretation in DIA-based proteomic studies.
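The comparison between nominal and true FDR described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy p-values, and the ground-truth labels are all made up for illustration. It applies the standard Benjamini-Hochberg adjustment and then uses known labels (spiked-in versus background) to compute the empirical FDR and power.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def evaluate(pvalues, is_true_change, alpha=0.05):
    """Compare nominal BH-FDR control against known ground truth."""
    qvalues = benjamini_hochberg(pvalues)
    called = [q <= alpha for q in qvalues]
    tp = sum(c and t for c, t in zip(called, is_true_change))
    fp = sum(c and not t for c, t in zip(called, is_true_change))
    n_true = sum(is_true_change)
    true_fdr = fp / (tp + fp) if (tp + fp) else 0.0
    power = tp / n_true if n_true else 0.0
    return {"discoveries": tp + fp, "true_fdr": true_fdr, "power": power}


# Toy data (made up): True marks a spiked-in protein, False a background protein.
pvals = [0.001, 0.002, 0.003, 0.004, 0.30, 0.60, 0.80, 0.90]
truth = [True, True, False, True, True, False, False, False]
result = evaluate(pvals, truth)
print(result)
```

In this toy case four proteins pass the nominal 0.05 threshold, one of which is a background protein, so the empirical FDR (0.25) exceeds the nominal level while one true change is missed (power 0.75).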

2

onsite: An Integrated Framework for Phosphosite Localization and False Localization Rate Estimation

Yue, Q.-X.; Wei, Z.; Dai, C.; Bai, M.; Perez-Riverol, Y.; Sachsenberg, T.

2026-07-11 bioinformatics 10.64898/2026.07.08.737157 medRxiv

Top 0.2%

11.8%

With the rapid development of mass spectrometry-based proteomics, the volume of phosphoproteomic data has increased substantially. However, accurate localization of phosphorylation sites and standardized statistical validation remain critical analytical bottlenecks. To address the lack of standardized cross-algorithm evaluation, we introduce onsite, a unified and open-source Python framework. onsite integrates an alanine-decoy strategy to estimate the false localization rate (FLR) across three algorithms: AScore, PhosphoRS, and pyLucXor. This modular architecture efficiently processes large-scale datasets and enables global FLR calculation. Benchmarking on the standard synthetic phosphopeptide dataset PXD000138 highlighted distinct inter-algorithmic variations. Using the same 5% global FLR threshold, pyLucXor localized the most target sites (28,353). It also reached a high accuracy (91.22%) against the known ground truth, resulting in the largest number of correctly localized sites (25,865). Reanalysis of the highly fractionated, large-scale PXD012255 dataset further demonstrated that native integration of onsite into the quantms pipeline enables scalable processing and provides a standardized framework for FLR control in large-scale phosphoproteomics.
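The abstract does not give the formula onsite uses, so the following Python sketch only illustrates the general idea of a decoy-based global FLR under stated assumptions. Localizations that land on a decoy residue are counted as known errors, and the FLR at each score cutoff is estimated as a scaled ratio of decoy to target hits. The function names, the `decoy_scale` parameter, and the toy scores are hypothetical and are not part of the onsite API.

```python
def flr_curve(sites, decoy_scale=1.0):
    """Estimate a global FLR at each score cutoff.

    sites: list of (score, is_decoy) pairs, higher score = more confident.
    decoy_scale: hypothetical correction for unequal numbers of target
                 and decoy candidate residues (assumption, default 1.0).
    Returns (score, n_targets, flr) tuples sorted by descending score.
    """
    ranked = sorted(sites, key=lambda s: s[0], reverse=True)
    rows, targets, decoys = [], 0, 0
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        flr = min(1.0, decoy_scale * decoys / targets) if targets else 1.0
        rows.append((score, targets, flr))
    # Make the estimate monotone: the FLR at a cutoff is the minimum
    # over this cutoff and all looser (lower-score) cutoffs.
    best = 1.0
    for i in range(len(rows) - 1, -1, -1):
        best = min(best, rows[i][2])
        rows[i] = (rows[i][0], rows[i][1], best)
    return rows


def targets_at_flr(sites, threshold=0.05, decoy_scale=1.0):
    """Largest number of target sites accepted at or below the FLR threshold."""
    passing = [n for _, n, flr in flr_curve(sites, decoy_scale) if flr <= threshold]
    return max(passing, default=0)


# Toy data (made up): 40 targets, one decoy, 10 more targets, one more decoy.
sites = [(float(s), False) for s in range(100, 60, -1)]
sites += [(60.0, True)]
sites += [(float(s), False) for s in range(59, 49, -1)]
sites += [(49.0, True)]
print(targets_at_flr(sites, threshold=0.03), targets_at_flr(sites, threshold=0.01))
```

Applying one shared threshold to the output of each scoring algorithm is what makes counts such as "sites localized at 5% global FLR" comparable across tools.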

3

Real-time artificial intelligence prediction of peptide characteristics and MSFragger search improves multiplexed quantification of non-canonical HLA presented peptides in clear cell renal cell carcinoma.

Marcu, A.; Leskoske, K.; Yu, F.; Nesvizhskii, A.; Klaeger, S.; Rose, C. M.

2026-06-02 cancer biology 10.64898/2026.05.29.727942 medRxiv

Top 0.2%

10.0%

Non-canonical HLA-presented peptides are promising therapeutic targets, but their low abundance makes them difficult to reproducibly identify and quantify, particularly in multiplexed immunopeptidomics workflows. Here we present MIRA-MS (Model-Informed Real-time Acquisition for Mass Spectrometry), a real-time acquisition strategy that combines fragment ion-indexed database searching with artificial intelligence-based prediction of peptide fragmentation and retention time to guide quantitative scan acquisition. In a clear cell renal cell carcinoma model, MIRA-MS increased the number of quantified non-canonical immunopeptides by 97-107% relative to standard acquisition methods while also improving recovery of canonical peptides by 45-89%. These results establish real-time AI-guided acquisition as a powerful approach for deeper and more reproducible immunopeptidome profiling.

4

Low-input proteomics enables proteome and phosphoproteome-scale molecular phenotyping of separase-deficient oocytes

Touati, S.; Legros, V.; Boyer, J.; Cochard, V.; Chevreux, G.; Wassmann, K.

2026-06-21 cell biology 10.64898/2026.06.17.732846 medRxiv

Top 0.2%

9.9%

We performed a comprehensive quantitative proteomic analysis of mouse oocytes using as few as 40 oocytes per condition, comparing wild-type and separase knockout oocytes at metaphase I and metaphase II. To this end, we generated a deep proteomic library spanning oocyte cell cycle stages, enabling the identification of numerous phosphosites without phosphopeptide enrichment. We further combined data-dependent (DDA) and data-independent (DIA) acquisition strategies, analyzed through multiple software pipelines in both library-based and library-free modes. Our results reveal extensive proteome remodeling during the metaphase I to metaphase II transition in wild-type oocytes, consistent with dynamic regulation of meiotic processes. As a proof of concept for our workflow, we asked whether separase knockout oocytes--unable to separate chromosomes in meiosis I--progress into meiosis II. Direct comparison of wild-type and separase knockout oocytes at the metaphase II stage revealed minimal global differences, supporting the idea that both conditions converge toward a comparable metaphase II-like cellular state despite distinct chromosomal configurations. However, at a finer scale, specific alterations were detected among chromosome-associated proteins. Notably, Meikin was enriched in separase-deficient metaphase II oocytes, consistent with defective separase-dependent cleavage and subsequent turnover. More broadly, several proteins involved in chromosome organization displayed behavior similar to Meikin, suggesting that separase activity regulates multiple substrates to orchestrate chromosome segregation during female meiosis.

5

From Peaks to Power: Systematic Evaluation of Chromatographic Sampling Reveals Determinants of Quantification and Biological Discovery in DIA Proteomics

Cantrell, L. S.; Just, S.; Stukalov, A.; Farokhzad, O. C.; Batzoglou, S.

2026-05-16 bioinformatics 10.64898/2026.05.13.724964 medRxiv

Top 0.2%

9.8%

Modern DIA proteomics increasingly emphasizes throughput and depth for large-cohort studies, but methods are often optimized using proxy metrics that can mask losses in quantifiable signal and statistical power. Here, we evaluate how datapoints per peak and other chromatographic features jointly contribute to quantification and downstream biological discovery. Using a matrix-matched calibration curve dataset, we checked how the number of datapoints per peak (DPPP) affects the limits of detection and quantification (LOD/LOQ). Reduced DPPP minimally affected LOD but substantially degraded LOQ. Feature modeling and nonparametric association analyses identified precursor peak area as the strongest feature-level predictor of LOQ, whereas DPPP showed weaker and context-dependent effects. Simulations of chromatographic peak integration recapitulated these trends, showing that increased sampling primarily improves integration precision, while quantitative accuracy is strongly governed by peak height and peak shape. Finally, when comparing 20 cancer vs 20 control plasma samples processed with Seer Proteograph, the decrease in DPPP led to a loss of statistical significance for proteins with low-abundance precursors. These findings argue that DIA optimization should prioritize LOQ and statistical power metrics - not identifications alone - by balancing sampling density with chromatographic peak height and quality to maximize useful biological signal.
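The link between sampling density and integration precision can be seen in a small noise-free simulation. This is a sketch under assumptions and not the authors' simulation: the peak is an ideal Gaussian, its base width is taken as 6 sigma, the area is computed with the trapezoidal rule, and all function names are hypothetical. For each number of datapoints per peak, the sampling grid is shifted relative to the apex and the spread of the integrated areas is reported.

```python
import math


def gaussian(t, apex=0.0, sigma=1.0, height=1.0):
    """Ideal Gaussian elution profile."""
    return height * math.exp(-0.5 * ((t - apex) / sigma) ** 2)


def trapezoid_area(dppp, offset_fraction, sigma=1.0):
    """Integrate a Gaussian peak sampled at `dppp` points per 6-sigma base width.

    offset_fraction shifts the sampling grid by a fraction of one sampling step,
    mimicking the arbitrary timing of scans relative to the peak apex.
    """
    step = 6.0 * sigma / dppp
    start = -8.0 * sigma + offset_fraction * step
    n = int(16.0 * sigma / step)
    times = [start + i * step for i in range(n + 1)]
    values = [gaussian(t, sigma=sigma) for t in times]
    return sum(0.5 * (values[i] + values[i + 1]) * step for i in range(n))


def area_spread(dppp, n_offsets=20, sigma=1.0):
    """Range of integrated areas across grid offsets, relative to the true area."""
    true_area = sigma * math.sqrt(2.0 * math.pi)
    areas = [trapezoid_area(dppp, k / n_offsets, sigma) for k in range(n_offsets)]
    return (max(areas) - min(areas)) / true_area


for dppp in (3, 6, 12):
    print(dppp, area_spread(dppp))
```

With no noise the spread shrinks quickly as the number of points grows, which matches the abstract's statement that denser sampling mainly improves integration precision. Real data adds detector noise and non-Gaussian peak shapes, which this sketch deliberately leaves out.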

6

Reference-Based Library Construction Improves Performance in low-input diaPASEF Workflows

Charkow, J.; Ghaznavi, M.; Seale, B.; Peng, J.; Gingras, A.-C.; Rost, H.

2026-05-04 bioinformatics 10.64898/2026.04.29.721088 medRxiv

Top 0.2%

9.5%

In low input mass spectrometry-based proteomics, Data Independent Acquisition (DIA), including diaPASEF, is quickly becoming the method of choice for label free quantification. Whether using empirical or in silico spectral libraries, performance is dependent on the library; however, the optimal library construction strategy for low input proteomics remains an open question. To address this, we examine and develop library construction approaches that are compatible with both spectrum-centric and peptide-centric analysis workflows. These approaches leverage a closely related, high-quality sample to improve library quality. First, we validated our approach in bulk sample amounts, where we observed that the effect of gas-phase fractionation-based library construction depends on the software framework, with improvements more pronounced in OpenSWATH compared to DIA-NN. In OpenSWATH, our peptide-centric library reconstruction workflow consistently outperforms a transfer learning strategy, an emerging alternative approach. In DIA-NN, trends depend on library source, highlighting OpenSWATH's stronger dependence on the search space. In low-input applications, such as single-cell-equivalent injection amounts (100 pg) of HeLa cell digest on a timsTOF SCP, our library construction approach provided more pronounced improvements across both software tools compared to bulk samples. Using a peptide-centric reconstruction approach with the OpenSWATH analysis framework, we detected over 15,000 peptide precursors (2480 protein groups), a 90% improvement over the original library. Furthermore, using a spectrum-centric construction approach, peptide precursor identification rates improved over 6-fold (~1000 to ~6000). Our strategy provides a practical solution for generating high-quality libraries in low-input applications.

7

Complementary Single-Cell Microflow HILIC and Ion Pair LC-MS Reveal Bystander Metabolic Effects in a Macrophage Model of Tuberculosis

Cook, A.; Deshpande, R.; Ellis, A. E.; Sheldon, R.; Davison, C.; Pascoe, J.; Bird, S.; Beste, D. J.; Bailey, M.

2026-06-23 microbiology 10.64898/2026.06.22.733771 medRxiv

Top 0.3%

8.2%

Single-cell metabolomics remains analytically challenging due to the low abundance and chemical diversity of metabolites in individual cells. We have developed complementary microflow HILIC and ion pair LC-MS methods to expand metabolite coverage in single macrophages. Ion pair LC-MS was applied to single cells for the first time, enabling retention of highly polar and ionic metabolites that elute early under conventional reversed-phase conditions. Across Mycobacterium bovis BCG infected, uninfected bystander, and control unexposed THP-1 macrophages, both microflow methods detected significantly more features than a previously reported analytical-flow HILIC method. The two microflow methods provided complementary chemical space, together yielding 633 unique named metabolites with MS2 spectra. This depth enabled pathway-level interpretation at single-cell resolution, revealing infection-associated changes in purine-, arginine-, glutathione-, and one-carbon folate-associated metabolism. Metabolite-level interrogation indicated shared purine and amino acid changes in both infected and neighbouring macrophages, while revealing a distinct bystander phenotype characterised by elevated glycine and heterogeneous ATP levels. Finally, we demonstrate sequential IP and HILIC analysis of the same single cell, establishing a route toward maximal coverage from individual cells. These results position microflow HILIC and IP LC-MS as powerful, orthogonal strategies for advancing single-cell metabolomics and unveiling heterogeneity within complex biological microenvironments.

8

Community Resource: A Genome-Based Extension of Large-Scale Wheat Proteogenomics

Vincent, D.; Appels, R.

2026-07-08 plant biology 10.64898/2026.06.17.733048 medRxiv

Top 0.3%

8.1%

Bread wheat (Triticum aestivum L.) possesses a large and highly repetitive allohexaploid genome and annotation requires extensive protein-level validation. We developed a genome-based wheat proteogenomics workflow integrating large-scale MS/MS reanalysis, GFF3-based peptide coordinate reconstruction, thorough validation, and genome browser-compatible peptide deployment against the IWGSC RefSeq v2.1 reference genome. Public wheat proteomics datasets comprising 577 raw mass spectrometry files (~1.0 TB) from 32 tissues were reprocessed using FragPipe/MSFragger, generating 2,226,779 non-redundant peptides and 1,648,740 unique protein accessions. Peptide-to-genome projections using GFF3 annotation files produced 8,291,056 genomic peptide projected rows, of which 98.14% passed validation procedures. Overall, peptide evidence supported 103,095 high-confidence (HC) and 135,495 low-confidence (LC) wheat gene models, corresponding to 96.4% and 84.7% of all parsed HC and LC annotations, respectively. In total, 238,590 wheat gene models (89.4% of all parsed annotations) received protein-level support. Apollo/JBrowse-compatible BED tracks enabled exon-resolved visualisation of peptide evidence across wheat chromosomes. Together, this study establishes a scalable GFF3-based proteogenomics framework for complex polyploid plant genomes and provides an extensive community resource for wheat genome annotation refinement and visual exploration (https://bread-wheat-um.genome.edu.au/apollo/49826/jbrowse/index.html).
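The peptide-to-genome projection step can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' code: the function names are hypothetical, and it assumes the CDS segments of one transcript are already parsed from GFF3 and that translation starts at the first CDS nucleotide (phase 0). A peptide given as a residue range in the protein is mapped to one or more genomic blocks, split at exon boundaries, and each block can then be converted to BED coordinates.

```python
def project_peptide(cds_segments, strand, aa_start, aa_end):
    """Map a peptide (1-based, inclusive residue range) to genomic blocks.

    cds_segments: list of (start, end) in GFF3 convention (1-based, inclusive).
    strand: "+" or "-".
    Returns genomic (start, end) blocks in ascending coordinate order.
    """
    # Order segments in the direction of transcription.
    ordered = sorted(cds_segments, reverse=(strand == "-"))
    # Peptide span as CDS nucleotide offsets (0-based, half-open).
    nt_from, nt_to = (aa_start - 1) * 3, aa_end * 3
    blocks, consumed = [], 0
    for seg_start, seg_end in ordered:
        seg_len = seg_end - seg_start + 1
        lo = max(nt_from, consumed)
        hi = min(nt_to, consumed + seg_len)
        if lo < hi:  # the peptide overlaps this CDS segment
            if strand == "+":
                blocks.append((seg_start + lo - consumed, seg_start + hi - consumed - 1))
            else:
                blocks.append((seg_end - (hi - consumed) + 1, seg_end - (lo - consumed)))
        consumed += seg_len
    return sorted(blocks)


def to_bed(block):
    """Convert a 1-based inclusive block to BED's 0-based half-open interval."""
    start, end = block
    return (start - 1, end)


# Toy transcript (made up): two CDS segments of 9 and 12 nucleotides.
cds = [(101, 109), (201, 212)]
print(project_peptide(cds, "+", 3, 5))  # peptide spanning the exon junction
print(project_peptide(cds, "-", 1, 2))  # first two residues on the minus strand
```

A peptide that crosses an exon junction yields two blocks, which is what allows exon-resolved display in a genome browser track.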

9

SILAC-Site: A streamlined workflow for determining phosphopeptide stoichiometries

Faisst, K. D.; Sinn, L. R.; Lau, K.; Szyrwiel, L.; Rappsilber, J.; Demichev, V.

2026-06-01 biochemistry 10.64898/2026.05.28.728441 medRxiv

Top 0.3%

8.1%

Mass spectrometry-based proteomics enables in-depth investigation of protein phosphorylation, quantifying tens of thousands of phosphosites per sample following phosphopeptide enrichment. However, critical information on phosphosite occupancy (stoichiometry) is typically lost during the phospho-enrichment process. Here, we introduce SILAC-Site, a SILAC-based fractionation-free and chemical labelling-free workflow for direct phosphosite stoichiometry evaluation using stable isotope labeling and phosphatase treatment. By acquiring treated or untreated peptides together with their heavy-labelled dephosphorylated counterparts within the same LC-MS runs, this approach provides internally controlled stoichiometry estimates compatible with high-throughput data-independent acquisition proteomics. Applying SILAC-Site to S. cerevisiae, we show that the majority of phosphopeptides identified only after enrichment possess low stoichiometries, and that inferred stoichiometry strongly correlates with the direct detection of phosphopeptides in samples without enrichment. Based on these findings, we propose the analysis of samples without enrichment as a simple complementary addition to a typical phosphoproteomics workflow, facilitating recovery of phosphorylation stoichiometry information.

10

Contextualised real-time mass spectrometry improves glycosylation detection and characterisation

Kelly, M. I.; Ashwood, C.

2026-07-03 biochemistry 10.64898/2026.07.03.736344 medRxiv

Top 0.3%

8.1%

Glycosylation is a structurally diverse, non-template-driven modification whose analysis by liquid chromatography-mass spectrometry is constrained by discovery-mode acquisition rules developed for proteomics. Data-dependent acquisition filters, such as intensity-based precursor selection and charge-state exclusion, map poorly onto glycans, which span wide ranges of charge state and abundance independent of their biological importance. Here we present glycosylation real-time mass spectrometry (GlycoRTMS), an instrument-API method that annotates observed precursor masses with glycan compositions in real time and uses this context to guide fragmentation. Composition-aware precursor prioritisation sampled deeper into the precursor space, expanding MS2 coverage of a hyaluronic acid hydrolysate from four to eight oligosaccharide subunits. Charge-state-specific collision energy equations tailored to oligosaccharides produced complete fragment ladders where fixed normalised collision energy did not. MS3 triggering gated by both diagnostic ions and glycan composition matching enabled efficient, chromatography-compatible characterisation of O-acetylated sialic acids and identified product ions specific to O-acetylation. Together, these strategies improve both the depth and quality of glycan detection and characterisation within a single injection.

11

Defining Quality Control Standards for Single-Cell Proteomics by Inter-Laboratory Benchmarking

van Puyenbroeck, S.; Claeys, T.; Seth, A.; Rijal, J.-B.; Keller, C.; Lin, L.; Mayer, R.; Matzinger, M.; Han, I.; Aragon Fernandez, P.; Petrosius, V.; Boyle, B.; Rivera, K.; Tourniaire, G.; Rosenberger, F. A.; Martens, L.; Carr, S. A.; Dong, Z.; Vegvari, A.; Carapito, C.; Kelly, R.; Mechtler, K.; Budnik, B.; Schoof, E. M.; Ctortecka, C.

2026-07-14 biochemistry 10.64898/2026.07.13.738155 medRxiv

Top 0.3%

8.1%

Single-cell proteomics can quantify thousands of proteins from individual mammalian cells, yet the absence of community-wide quality control limits biological interpretability. Here, the HUPO Single Cell Initiative presents the first inter-laboratory single-cell proteomics benchmarking study across seven laboratories using standardized 384-well plates acquired on Orbitrap Astral and timsTOF Ultra2 instruments. Centralized analysis across six DIA software tools revealed that software choice impacts identification depth and quantitative accuracy more than instrument vendor. Multi-layered quality control enabled the detection of cell-leakage during sorting, LC misconfiguration, column degradation and site-specific pipetting failures. Inter-lab quantitative correlations were strongest between instruments of the same vendor relative to cross-platform comparisons. Sequential correction for plate identity and well position recovered clean cell-type separation for confident downstream differential expression analysis. This study provides a data-driven quality control framework spanning plate design to batch correction for reproducible single-cell proteomics across laboratories and platforms.

12

High-Speed Mass Spectrometers diminish the difference between Data-Dependent and Data-Independent Acquisition Proteomics

O'Sullivan, N.; Bayer, F. P.; Mogler, C.; Kuster, B.

2026-05-28 biochemistry 10.64898/2026.05.26.727836 medRxiv

Top 0.3%

7.9%

Data-dependent acquisition mass spectrometry (DDA-MS) and data-independent acquisition mass spectrometry (DIA-MS) have historically offered complementary strengths in bottom-up proteomics, with DDA providing high-selectivity spectra for post-translational modification (PTM) analysis and DIA enabling more systematic peptide sampling. Here, we asked if this is still the case for the Orbitrap Astral platform that offers high-speed DDA and (ultra-) narrow-window DIA (nDIA) capabilities across proteome and phosphoproteome applications. When DDA and DIA measurements were parameter-matched (to the extent possible), the differences in analytical performance diminished markedly. Across extensive replicate analyses, both methods continued to identify new peptides and proteins without reaching saturation, indicating that the molecular complexity of biological samples still overwhelms even the fastest liquid chromatography-MS (LC-MS) methods. Incomplete sampling also contributed to substantial peptide-level non-overlap between DDA and nDIA and data completeness was only modestly better for nDIA than DDA across many replicates. Quantitatively, DDA and nDIA showed broadly similar precision and accuracy, with nDIA offering slightly higher precision and DDA slightly better accuracy in controlled mixture experiments. MS1-based quantification outperformed MS2-based quantification, particularly for short gradients, supporting MS1 quantification as a robust and general strategy for high-throughput proteomics. In phosphoproteomic samples, DDA and nDIA identified similar numbers of phosphopeptides, but DDA retained a small edge for phosphorylation site localisation. Together, the results show that advances in acquisition speed and sensitivity are narrowing the historical gap between DDA and DIA, while also revealing that current LC-MS workflows remain far from providing comprehensive proteome coverage. 
Going forward, further gains in dynamic range, scan speed, sensitivity, and transparent software tools will be required to reach systematic, comprehensive and reliable measurements of complex proteomes in a single shot.

13

The plant circadian clock exerts stronger control over the diel proteome than the transcriptome

Mehta, D.; Talasila, M.; Lau, Z. X.; Rodriguez Gallo, M. C.; Li, Q.; Zhong, Y.; Muzumdar, S.; Li, R.; Luo, W. J.; Lau, V.; Pasha, A.; Lock, S.; Ezer, D.; Provart, N. J.; Uhrig, R. G.

2026-06-05 plant biology 10.64898/2025.12.19.695194 medRxiv

Top 0.3%

7.9%

The plant circadian clock is a genetic circuit composed of multiple mutually-regulating transcription factors that together synchronize internal biological rhythms to the ~24-hour period of planetary rotation. While it has been known for over a decade that nearly 40% of the transcriptome in the model plant Arabidopsis oscillates with a circadian rhythm, it is yet unclear to what extent this translates to the proteome. Here, through parallel quantitative proteome and transcriptome time-course profiling of Arabidopsis wild-type plants and a panel of clock deficient plant lines, we show that specific clock genes exercise extensive control over diel proteome rhythmicity, and to a much greater extent than they do the transcriptome. This control results in a clock-dependent synchronization of rhythmic proteins along a bimodal phase distribution that is lost in circadian clock deficient plants. This suggests pervasive post-translational control of gene expression by specific elements of the circadian system, notably the morning expressed LHY/CCA1 module. Our findings imply that the circadian clock exercises much greater control of gene expression through proteostasis mechanisms than previously recognised, necessitating a recalibration of our current understanding of clock proteins as primarily transcriptional regulators.

14

Systematic optimization and benchmarking of synchro-PASEF for high-throughput phosphoproteome profiling

Brademan, D.; Mullarkey, A.; Greeson, M.; Szvetecz, S.; Vitek, O.; Blythe, E.; Huttenhain, R.

2026-06-27 biochemistry 10.64898/2026.06.26.734570 medRxiv

Top 0.3%

7.7%

High-throughput data-independent acquisition (DIA) workflows paired with short chromatographic separations are increasingly adopted for systems biology and clinical proteomics. However, narrower peak widths from rapid separations demand faster mass spectrometer cycle times to maintain quantitative depth and reproducibility. The synchro-PASEF acquisition mode on timsTOF mass spectrometers diagonally scans across ion mobility and m/z space, enabling efficient sampling of the precursor ion cloud with shortened cycle times. While synchro-PASEF has demonstrated competitive identification depth for global protein abundance samples compared to conventional dia-PASEF, its performance for phosphoproteomics - where the precursor ion cloud is characteristically broader and bimodally distributed - has not been evaluated. Here, we systematically optimized synchro-PASEF methods for phosphoproteomics and benchmarked performance against two dia-PASEF methods across three sub-hour separations. We found that synchro-PASEF performance depends critically on balancing diagonal window number, total isolation width, and gradient length, with longer gradients favoring more windows for selectivity and shorter gradients favoring fewer windows to preserve sampling frequency. An optimized configuration quantified over 19,000 localized phosphosites using a 23-minute separation. Retention time summation (RTsum) with a factor of 2 increased phosphopeptide identifications by 5-20% and reduced phosphosite-level coefficients of variation by up to 30% across all dia-PASEF and synchro-PASEF methods tested. 
Using β2-adrenergic receptor (B2AR) activation as a signaling model, we demonstrate that label-free DIA phosphoproteomics can be used to model phosphoproteomics dose-response relationships, showing that synchro-PASEF and dia-PASEF produce highly concordant phosphoproteomic responses, with comparable numbers of responding phosphosites, similar effect sizes, and nearly identical predicted protein kinase A (PKA) substrates downstream of the activated B2AR. While synchro-PASEF did not surpass optimized dia-PASEF in identification depth, its comparable biological performance and amenability to post-acquisition optimization through RTsum support its utility for high-throughput phosphoproteomics. This work provides a transferable framework for synchro-PASEF method optimization and demonstrates the broad utility of retention time summation for PASEF-based phosphoproteomics workflows.

15

Stoichiometry-dependent specificity in biotin enrichment: a benchmarking framework for proximity labeling proteomics

Zala, C. A.; Trueba Sanchez, M. C.; van den Bor, J.; Willemsens, T.; Verweij, F. J.; Altelaar, M.; Stecker, K.

2026-05-11 molecular biology 10.64898/2026.05.07.723439 medRxiv

Top 0.3%

7.1%

Proximity labeling methods (including BioID, TurboID, and ultraID), along with surface proteomics and microdomain mapping, enable proteome-wide identification of spatially proximal proteins via MS-based analysis. These workflows require specific enrichment of biotinylated proteins using affinity purification, yet enrichment specificity can often be compromised by non-specifically bound proteins. As labeling strategies are increasingly applied to complex biological samples with low protein input or low biotin stoichiometry, accurately distinguishing true targets from background becomes a major analytical challenge. Despite its critical impact on data quality and interpretation, the influence of biotinylation level and protein input on enrichment performance remains poorly characterized, limiting the reliability of proximity labeling experiments. To address this, we establish a quantitative benchmarking framework that systematically evaluates biotin enrichment under controlled conditions, including scenarios of low biotin stoichiometry. Using this setup, we show that enrichment specificity strongly depends on biotin stoichiometry: higher levels of biotinylation in samples yield high specificity, whereas low biotinylation increases non-specific background. Reduced protein input further limits recovery of true targets, yet maintains enrichment specificity, highlighting sensitivity constraints of enrichment-based workflows. We apply this framework to biotinylated extracellular vesicle (EV) cargo uptake in recipient cells using ultraID-CD63 labeling. Detection of the most abundant EV cargo proteins under low biotinylation conditions indicates that current workflows approach the lower bounds of biotin enrichment sensitivity. Together, these standards provide a practical reference for evaluating and optimizing biotin enrichment workflows, supporting quantitative and reproducible proximity labeling in proteomics.

16

Top-down Sequencing of Intact Proteoforms using the timsOmni mass spectrometer: Accurate Determination of Co-occurring Histone Modifications

Berthias, F.; Bilgin, N.; Smyrnakis, A.; Le Boiteux, E.; Kosmopoulou, M.; Albers, C.; Suckau, D.; Mecinovic, J.; Papanastasiou, D.; Jensen, O. N.

2026-05-05 biochemistry 10.64898/2026.05.01.722147 medRxiv

Top 0.3%

6.8%

Deep characterization of intact proteoforms remains an analytical challenge in functional proteomics, particularly for heterogeneous multi-site post-translational modifications at distinct amino acid residues. Histones are among the most dynamically and diversely post-translationally modified proteins in eukaryotic cells, carrying multiple, co-occurring and reversible modifications that can give rise to isomeric proteoform species. Tandem mass spectrometry with multimodal fragmentation capabilities is a promising approach for deep characterization of intact proteoforms, such as modified histones. We applied the novel timsOmni mass spectrometer, which incorporates the Omnitrap platform enabling multimodal MS workflows, for residue-level mapping of histone modifications, including acetylation and methylation. Recombinant histones H3.1 and H4 were in vitro acetylated by enzymes GCN5, PCAF and p300 to generate mono- and multi-acetylated proteoforms. Complementary MS2 electron- and collision-based dissociation (ECD, EID, RCID and ECciD), together with MS3 strategies, produced complete or near-complete backbone fragmentation of intact protein ions (>92% amino acid sequence coverage). For monoacetylated species generated by the more site-selective lysine acetyltransferases, the dominant proteoform matched the known catalytic preferences of the enzymes (H3.1K14ac for GCN5 and PCAF, and H4K8ac for PCAF), while minor positional isomers were also identified and their relative abundance estimated. In contrast, the broader substrate specificity of p300 produced a wide distribution of H4 proteoforms bearing up to seven acetylated lysine residues. Species carrying six and seven acetylations were characterized by multimodal MS2/MS3 experiments, enabling localization of individual acetylation sites and discrimination of positional isomers. 
Finally, endogenous histone proteoforms from liver extracts were analyzed, yielding sequence coverages of 92-93% for the most abundant species and enabling confident localization of multiple PTMs (acetylation and methylation). These results illustrate that multimodal MSn fragmentation of intact proteins supports residue-level assignment of combinatorial histone marks and coexisting positional isomers.
Highlights: Multimodal MS2/MS3 maps histone PTMs on intact proteins. ECD, EID, RCID, and ECciD provide complete or near-complete sequence coverage. MS3 localizes acetylation sites, distinguishes positional isomers. Endogenous H4 proteoforms are assigned with site-specific PTM mapping.

17

Adaptive Focused Acoustics-integrated proteome profiling of macrophages uncovers low abundant proteins associated with immune homeostasis, inflammatory response, and transport

McAlister, J. A.; Woods, M.; Abarzua, L.; Vasantgadkar, S.; Bhattacharyya, D.; Geddes-McAlister, J.

2026-05-28 immunology 10.64898/2026.05.26.728044 medRxiv

Top 0.3%

6.8%

Show abstract

Efficient and reproducible protein extraction is a critical step in mass spectrometry-based proteomics workflows, particularly for complex host-pathogen systems where low-abundance immune-associated proteins are difficult to detect. Probe sonication methods used for cell lysis require mitigation of excessive heat generation to prevent degradation of biologically important proteins, while also limiting throughput and potentially introducing sample-to-sample variability. In this study, we evaluated adaptive focused acoustics (AFA) technology as an alternative approach for macrophage lysis and for protein extraction and digestion within a standard proteomics workflow coupled with mass spectrometry. We observed that AFA technology reduced hands-on processing times and overall workflow timelines, and that single-sample AFA technology improved proteome coverage, dynamic range, and reproducibility. We also evaluated multiplexed AFA technology for lysis and observed an exclusive macrophage proteome, as well as an influence on replicate reproducibility and on dynamic range detection for low-abundance proteins. Moreover, multiplexed AFA technology for macrophage lysis and digestion further increased protein identifications, replicate reproducibility, and dynamic range. Within the AFA-exclusive proteome, 86 proteins were detected across all AFA-based lysis and digestion methods, including low-abundance proteins associated with macrophage homeostasis, inflammatory response, and transport. Together, these findings demonstrate that AFA technology enhances reproducibility, throughput, and proteome depth for macrophage protein extraction while enabling the detection of biologically relevant low-abundance immune-associated proteins. These improvements provide a strong foundation for future investigation of host-pathogen infection models, where pathogen-derived proteins remain challenging to detect within complex host proteomes.

18

PeptiDIA: A Machine Learning Framework for Enhanced Peptide Identification in Fast-Gradient Data-Independent Acquisition Proteomics

Ortona, J.; Leclercq, M.; Roux-Dalvai, F.; Routy, B.; Bonnet, S.; Droit, A.

2026-06-12 bioinformatics 10.64898/2026.06.10.731224 medRxiv

Top 0.3%

6.3%

Show abstract

Data-independent acquisition (DIA) mass spectrometry has become increasingly prevalent in proteomics as advances in instrumentation, chromatography, and computational analysis have enabled robust proteome identification across complex biological samples. However, the analytical depth achieved with fast chromatographic gradients remains lower than that obtained using long gradients, reflecting a throughput-depth trade-off. Here, we present PeptiDIA, a machine learning framework that enhances peptide identification in fast-gradient DIA data by leveraging paired fast- and long-gradient acquisitions from identical samples. PeptiDIA processes DIA-NN outputs generated at relaxed false discovery rate thresholds to obtain expanded candidate peptide pools and trains gradient-boosted decision tree models using long-gradient identifications as reference labels. The model integrates DIA-NN features with engineered peptide descriptors and applies isotonic regression to calibrate probabilities, enabling controlled peptide recovery relative to the long-gradient reference. Applied to human and murine datasets spanning six tissues acquired on an Orbitrap Exploris 480, PeptiDIA increased peptide identifications by 25-34% at a 1% target reference-discordance rate (RDR) and increased the number of protein groups containing at least one rescued peptide by 15-17%. Overall, PeptiDIA improves the identification depth of fast-gradient DIA-NN workflows without altering acquisition strategies. The framework is available as a web application and command-line tool at https://github.com/Jordano700/PeptiDIA.
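
The abstract does not spell out how a target RDR is enforced, so the following is only an illustrative sketch, not PeptiDIA's implementation. It assumes that RDR means the fraction of accepted candidate peptides that are absent from the long-gradient reference, and that each candidate already carries a calibrated probability. The function name `cutoff_for_target_rdr` and the toy data are hypothetical.

```python
from typing import List, Tuple


def cutoff_for_target_rdr(
    scored: List[Tuple[float, bool]], target: float = 0.01
) -> Tuple[float, int]:
    """Pick the most permissive score cutoff whose discordance rate <= target.

    `scored` holds (calibrated_probability, in_long_gradient_reference) pairs.
    Discordance rate is ASSUMED to be: accepted candidates not in the
    reference, divided by all accepted candidates.
    Returns (cutoff, number_accepted); (inf, 0) if no cutoff qualifies.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best_cutoff, best_n = float("inf"), 0
    discordant = 0
    for n, (score, in_reference) in enumerate(ranked, start=1):
        if not in_reference:
            discordant += 1
        # Candidates with tied scores must be accepted or rejected together,
        # so only evaluate the rate at the end of each tie group.
        if n < len(ranked) and ranked[n][0] == score:
            continue
        if discordant / n <= target:
            best_cutoff, best_n = score, n
    return best_cutoff, best_n


# Toy candidates: (calibrated probability, found in long-gradient reference)
candidates = [
    (0.99, True), (0.95, True), (0.90, True),
    (0.80, False), (0.70, True), (0.60, False),
]
cutoff, accepted = cutoff_for_target_rdr(candidates, target=0.25)
```

In practice the cutoff would be chosen on held-out samples rather than on the data used to train or calibrate the model, since the reference labels are otherwise reused.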

19

Near-Zero Missed Cleavages with a High-Fidelity Recombinant Arg-C Zero for Mass Spectrometry-Based Proteomics

Hernandez-Rollan, C.; Elsborg, J. D.; Le Boiteux, E.; Lu, Y.; Patel, K.; Ahel, I.; Jensen, O. N.; Batth, T. S.; Olsen, J. V.

2026-05-28 biochemistry 10.64898/2026.05.28.728370 medRxiv

Top 0.4%

6.2%

Show abstract

Proteolytic digestion remains a critical step in bottom-up proteomics workflows, with enzyme specificity and efficiency directly impacting peptide identification and protein sequence coverage. Here, we present the comprehensive characterization of Arg-C Zero, a recombinant arginyl endopeptidase derived from Porphyromonas gingivalis that exhibits exceptional fidelity in cleaving specifically at the C-terminus of arginine residues. Unlike conventional serine proteases such as Trypsin, Arg-C Zero utilizes a histidine-cysteine catalytic dyad mechanism, achieving near-zero missed cleavage rates (>99% efficiency) under standard proteomics conditions. Through systematic evaluation using HeLa protein extracts, we demonstrate that Arg-C Zero maintains consistent performance across varying digestion times. The enzyme shows robust activity across a broad pH range and tolerates up to 4 M urea, making it ideally suited for a diverse range of proteomics sample preparation workflows. While Trypsin/LysC combinations remain superior for comprehensive proteome coverage, Arg-C Zero offers unique advantages for applications requiring high specificity and reproducible arginine-specific cleavage patterns, particularly for analysis of post-translational modifications (PTMs). Here, we demonstrate how Arg-C Zero aids comprehensive mapping of histone PTMs and, when used in low-pH workflows, helps preserve labile ADP-ribosylation sites, expanding the analytical capabilities of mass spectrometry for characterizing these challenging modifications. The enzyme's resistance to proline-adjacent cleavage sites and compatibility with standard mass spectrometry buffers position it as a valuable addition to the proteomics enzyme toolkit.
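
The abstract does not say how the missed cleavage rate or the ">99% efficiency" figure was computed. As a rough sketch under stated assumptions, the code below counts a missed cleavage as any arginine that is not the C-terminal residue of an identified peptide, and defines efficiency as cleaved sites divided by cleaved plus missed sites. Both definitions and the function names are hypothetical, and no proline-related rule is applied.

```python
from typing import Iterable


def count_missed_cleavages(peptide: str) -> int:
    """Count arginines that are not the C-terminal residue of the peptide."""
    return peptide[:-1].count("R")


def cleavage_efficiency(peptides: Iterable[str]) -> float:
    """Cleaved sites / (cleaved + missed sites) over a set of peptides.

    ASSUMPTION: a peptide ending in R marks one cleaved site; each internal
    R marks one missed site. Peptides not ending in R (for example protein
    C-terminal peptides) contribute no cleaved site.
    """
    peptide_list = list(peptides)
    cleaved = sum(1 for p in peptide_list if p.endswith("R"))
    missed = sum(count_missed_cleavages(p) for p in peptide_list)
    total = cleaved + missed
    return cleaved / total if total else float("nan")


# Toy peptides: the third one carries one internal arginine.
example_peptides = ["ACDER", "GHIKR", "LMRNPQR"]
efficiency = cleavage_efficiency(example_peptides)
```

Note that lysine is not treated as a cleavage site here, which matches the arginine-specific cleavage described above; peptides such as `GHIKR` therefore count as fully cleaved.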

20

Fibroblast growth factor receptor substrate 2 interactome mapping reveals novel candidate interactors associated with migration and invasion

Kopp, L. L.; Ciraulo, B.; Hochuli, D.; Versamento, D.; Baumgartner, M.

2026-05-10 cancer biology 10.1101/2025.09.23.678042 medRxiv

Top 0.4%

6.2%

Show abstract

The scaffold protein FRS2 is central to FGFR signaling, linking receptor activation to MAPK/ERK and PI3K/AKT pathways. Elevated FRS2 expression correlates with aggressive tumor phenotypes and poor prognosis across multiple cancers, including the pediatric cerebellar tumor medulloblastoma (MB). Here, we characterized FRS2's subcellular localization and interactome in MB cells, employing live-cell imaging, phosphoproteomics, immunoprecipitation, and APEX2-based proximity labeling. We found that increased FRS2 expression is associated with increased motile and invasive behavior in MB tumor cells. We furthermore identified novel candidate FRS2-associated proteins involved in actin cytoskeleton remodeling, cell junction assembly, and translation initiation, which indicate a growth factor-dependent reorganization of the FRS2 signalosome. Our data also indicate a regulatory role of FRS2 in directing the subcellular distribution of TJP1, a regulator of cell junctions and cell motility. Our findings highlight the relevance of FRS2 as a mediator of cell motility and invasiveness and provide candidate proteins associated with FRS2 that are involved in cellular processes governing migration and invasion. This study thus provides a framework for exploring the FRS2 interactome as a possible target to attenuate FGFR-driven oncogenic processes with next-generation therapeutic strategies.